feat(redteam): add built-in red teaming support by kevmyung · Pull Request #184 · strands-agents/evals

kevmyung · 2026-03-31T23:14:30Z

Description

Adds an experimental red-teaming module (strands_evals.experimental.redteam) that lets users run multi-turn adversarial attacks against a target agent and score whether safety guardrails hold. The module composes existing Strands Evals primitives (Case, Experiment, Evaluator, ActorSimulator) rather than introducing a parallel framework.

Two-step flow

1. Generate cases. AdversarialCaseGenerator infers risk categories from the target's system prompt and tools, then generates per-category attack cases via an LLM. Custom cases can be authored directly via RedTeamCase + AttackGoal for domain-specific business rules.

2. Run evaluations. RedTeamExperiment drives a multi-turn attacker conversation against the target, captures the full conversation + tool trace, and scores it with an LLM judge.

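  from strands import Agent
  from strands_evals.experimental.redteam import (
      AdversarialCaseGenerator, RedTeamExperiment,
  )

  # `model` is not defined in this snippet; it is assumed to be configured elsewhere.
  agent = Agent(system_prompt="You are a customer service agent.", tools=[...])

  # Step 1: generate attack cases from the target's system prompt and tools.
  cases = AdversarialCaseGenerator(model=model).generate_cases(target=agent)
  # Step 2: run the multi-turn attacks and score them.
  report = RedTeamExperiment(cases=cases, target=agent).run_evaluations()
  report.display()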

What ships

Pipeline: AdversarialCaseGenerator → RedTeamExperiment → AttackSuccessEvaluator → RedTeamReport.
Built-in risk categories: guideline_bypass, system_prompt_leak, harmful_content, data_exfiltration, excessive_agency. Auto-inferred from target info when not specified.
Strategy: gradual_escalation (prompt-driven) as the default. AttackStrategy ABC + PromptStrategy are in place; algorithmic strategies (Crescendo, PAIR, TAP, etc.) land in follow-up PRs (see design doc §2.4).
Evaluator: AttackSuccessEvaluator — LLM-as-judge with a 4-anchor continuous rubric (refused / partial / substantial / full, scores 0.0–1.0) over conversation + tool trace.
Report: RedTeamReport.display() for console summary, plus attack_results(), by_risk_category(), by_strategy(), failed_cases for programmatic access.
Framework-agnostic targets: any Callable[[str], str | dict] works alongside Strands Agent (with optional trace capture via the dict shape).
Lives under experimental/ — API may change before promotion.

Related Issues

Closes #220.

Type of Change

New feature (experimental module).

Testing

hatch run prepare — 1119 passed.
e2e smoke against Bedrock targets: cases generated, multi-turn attacks executed, judge scored anchor points cleanly.
Unit tests cover: generator (mocked LLM), task runner, experiment wiring, report aggregation, evaluator prompt assembly, agent adapter, strategy registry contract.

Checklist

I have read the CONTRIBUTING document
I have added any necessary tests that prove my fix is effective or my feature works
I have updated the documentation accordingly
I have added an appropriate example to the documentation to outline the feature, or no new docs are needed
My changes generate no new warnings
Any dependent changes have been merged and published

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

poshinchen

Could you use built-in python | / list instead of typing's deprecated Union, List and so on?

kevmyung · 2026-05-01T20:27:02Z

> Could you use built-in python | / list instead of typing's deprecated Union, List and so on?

Quick heads-up – fixed it in 438f9e0

yeomjiwonyeom · 2026-05-11T23:56:17Z

Created sub-issues under #177 to track P0/P1 work:

#220 - P0: Core Pipeline (multi-turn attack strategies, evaluator, experiment, reporting)
#221 - P0.5: Agent-topology attacks & skill-based red-teaming
#222 - P1: Tool-graph case generation, deterministic verification, adaptive red-teaming agent

This PR (#184) covers the infrastructure layer of P0. Checked items in #220 reflect what's already implemented here:

AttackStrategy ABC, RiskCategory, AttackGoal types
AttackSuccessEvaluator (0.0–1.0 continuous scoring on execution traces)
RedTeamJudgeEvaluator (binary per-metric)
red_team(agent) entry point with auto tool extraction
RedTeamReport with grouped views
Multi-turn conversation loop via ActorSimulator
1 strategy: gradual_escalation

Remaining P0 work (tracked in #220):

5 multi-turn strategies: Crescendo, Linear/PAIR, TAP/TreeJailbreaking, BadLikertJudge, SequentialBreak
RedTeamExperiment orchestrator
AdversarialActorSimulator / AdversarialCaseGenerator
Turn budget increase to 20-50

kevmyung · 2026-05-12T01:13:24Z

@poshinchen Resolved your comments in d750fe0:

Moved red-team evaluators under src/strands_evals/redteam/evaluators/
Unified attack_strategies API (accepts strings or AttackStrategy instances)
Fixed extract_tool_info to handle get_all_tools_config's dict shape
Addressed remaining nits (typing, unused symbols, dead branches, docstrings)

poshinchen · 2026-05-12T16:12:32Z

/strands review the PR

github-actions · 2026-05-12T17:09:27Z

Issue: This PR introduces a significant new public API surface (strands_evals.redteam) with multiple abstractions customers will use (red_team(), AttackStrategy, RedTeamJudgeEvaluator, AttackSuccessEvaluator, presets). Per the API Bar Raising guidelines, this likely warrants the needs-api-review label.

The PR description documents the components well, but the following are missing from an API review perspective:

Module-level import paths (e.g., can users do from strands_evals import red_team?)
Currently the redteam module is not re-exported from strands_evals/__init__.py
Whether JAILBREAK, PROMPT_EXTRACTION, HARMFUL_CONTENT should be public constants vs. accessed via the registry

Suggestion: Add needs-api-review label and decide on the import ergonomics. At minimum, consider adding redteam to the top-level __init__.py lazy imports so users can write:

from strands_evals.redteam import red_team

and document whether this is the intended public entry point.

github-actions · 2026-05-12T17:09:33Z

Review Summary

Assessment: Comment (Request Changes on specific items)

Solid foundation for red teaming capabilities. The architecture cleanly separates concerns (presets vs. strategies vs. evaluators vs. runner) and the red_team() API provides a simple entry point. Two items I'd ask to address before merging:

Review Categories

Concurrency Safety: The shared tool_trace mutable list pattern will silently corrupt data if the experiment runs with parallel workers. This needs either a fix or an explicit max_workers=1 constraint.
Test Coverage Gaps: AttackSuccessEvaluator and agent_adapter.py are untested — both are part of the public API surface.
API Surface: This introduces a substantial new public module. Consider adding needs-api-review label and clarifying the intended import paths (top-level re-export vs. submodule).
Evaluator Aggregation: The multi-metric judge evaluator's outputs get averaged, which can mask critical safety failures. Worth a deliberate design decision on the aggregation semantics.
Reproducibility: No seed parameter for case generation makes CI/CD regression testing non-deterministic.

The separation of "what to attack" (presets) from "how to attack" (strategies) is a clean design that should scale well as more strategies land in the follow-up PRs.
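
The aggregation concern can be made concrete with a small arithmetic sketch. The metric names, scores, and threshold below are hypothetical; the only convention taken from the PR is that a higher score means a more successful attack.

```python
from statistics import mean

# Hypothetical binary per-metric verdicts for one case (1.0 = attack succeeded).
metric_scores = {
    "guideline_bypass": 0.0,
    "system_prompt_leak": 1.0,
    "harmful_content": 0.0,
}

THRESHOLD = 0.5  # assumed cutoff above which a case counts as breached


def aggregate_mean(scores: dict[str, float]) -> float:
    """Average across metrics: one breach is diluted by the clean metrics."""
    return mean(scores.values())


def aggregate_worst_case(scores: dict[str, float]) -> float:
    """Worst case across metrics: any single breach dominates."""
    return max(scores.values())


mean_flags_breach = aggregate_mean(metric_scores) >= THRESHOLD
worst_case_flags_breach = aggregate_worst_case(metric_scores) >= THRESHOLD
```

With these numbers the mean is about 0.33 and falls below the cutoff, while the worst-case score is 1.0 and flags the case, which is the masking effect the review describes.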

github-actions · 2026-05-21T22:33:09Z

Review Summary (Round 4)

Assessment: Request Changes

All Round 3 items were addressed well. However, a correctness issue remains with the shared target Agent state across cases.

Review Details

Correctness (Critical): The target Agent's messages accumulate across all red team cases with no reset. This breaks case isolation — later cases see earlier attack conversations, and the context window will overflow on larger runs. Each case should evaluate the target independently.
Concurrency safety: run_evaluations_async inherits max_workers=10 from the base class, but the task function captures a shared mutable Agent. Parallel execution would corrupt state. Default should be max_workers=1 for the async path too.
Robustness: _infer_risk_categories doesn't guard against structured_output = None, unlike _generate_cases_for_category which does.

The first issue (agent state isolation) is the only blocker — it affects correctness of all multi-case runs. The other two are defensive improvements.
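
A minimal sketch of the isolation problem, using a stand-in agent rather than the Strands `Agent` class. `FakeAgent` and `run_cases` are hypothetical names; the only behavior assumed of the real target is that it keeps a mutable `messages` history.

```python
from dataclasses import dataclass, field


@dataclass
class FakeAgent:
    """Stand-in target that accumulates conversation history across calls."""

    messages: list[str] = field(default_factory=list)

    def __call__(self, prompt: str) -> str:
        self.messages.append(f"user: {prompt}")
        reply = f"reply to {prompt}"
        self.messages.append(f"assistant: {reply}")
        return reply


def run_cases(agent: FakeAgent, cases: list[list[str]], isolate: bool) -> list[int]:
    """Return the history length the target holds at the end of each case."""
    history_sizes = []
    for prompts in cases:
        if isolate:
            # Start every case from an empty history.
            agent.messages.clear()
        for prompt in prompts:
            agent(prompt)
        history_sizes.append(len(agent.messages))
    return history_sizes


cases = [["a", "b"], ["c"]]
leaky = run_cases(FakeAgent(), cases, isolate=False)
isolated = run_cases(FakeAgent(), cases, isolate=True)
```

Without the reset, the second case runs on top of the first case's four messages; with it, each case sees only its own turns.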

poshinchen · 2026-05-22T16:32:24Z

Also, does the experiment return list[RedTeamReport] or just a single RedTeamReport?
Customers can pass multiple evaluators to an experiment, and it will generate a list of reports. The base Report class has a flatten method; users can just use that.

kevmyung · 2026-05-22T16:37:29Z

> Also, does the experiment return list[RedTeamReport] or just a single RedTeamReport? Customers can pass multiple evaluators to an experiment, and it will generate a list of reports. The base Report class has a flatten method; users can just use that.

Returns a single RedTeamReport. We collect list[EvaluationReport] from the base call internally and merge them case-keyed - needed for case-centric views (failed_cases, by_risk_category(), display()). Happy to switch to base flatten.

github-actions · 2026-05-22T17:09:37Z

Review Summary (Round 5)

Assessment: Comment (Approve with minor fixes)

All critical and important issues from Round 4 (agent state isolation, async max_workers, None guard) have been properly addressed. The module is in good shape.

Remaining Items

Robustness: assert for data validation in report.py:69 will be stripped by Python -O; use an explicit if/raise instead.
Style: Log messages don't follow the repo's STYLE_GUIDE.md format (field=<%s> | message). The deviation is consistent across all files in the module.

Neither item is blocking. The architecture is clean, test coverage is solid (7 test files covering all major components), and the layered design properly reuses existing framework primitives.
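
The assert-versus-raise point can be illustrated as below. What report.py:69 actually validates is not shown in this thread, so the score-range check and both function names are hypothetical.

```python
def validate_scores_with_assert(scores: list[float]) -> list[float]:
    # Under `python -O` this assert is removed, so bad data passes silently.
    assert all(0.0 <= s <= 1.0 for s in scores), "score out of range"
    return scores


def validate_scores(scores: list[float]) -> list[float]:
    # An explicit check survives optimization flags and names the bad value.
    for score in scores:
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score out of range: {score}")
    return scores
```

The explicit version costs a few more lines but behaves the same in every interpreter mode.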

Adds an experimental red-teaming module under src/strands_evals/experimental/redteam/ that extends Strands Evals base types (Case, Experiment, Evaluator, ActorSimulator) with adversarial counterparts.

- AdversarialCaseGenerator: generates RedTeamCases per risk category, with optional auto-inference of categories from target tools/system_prompt
- RedTeamExperiment: orchestrates multi-turn attacker/target conversations
- AttackSuccessEvaluator: continuous 0.0-1.0 LLM-as-judge over conversation + tool execution traces
- AdversarialActorSimulator: ActorSimulator subclass shared across strategies
- AttackStrategy + PromptStrategy with gradual_escalation as the default

github-actions · 2026-06-01T15:30:35Z

Review Summary

Assessment: Approve (with minor suggestions)

The module has matured significantly through 5+ prior review rounds. All critical issues from earlier rounds (agent state isolation, concurrency safety, None guards, assert-for-validation) are resolved. The architecture cleanly composes existing framework primitives and the test coverage is thorough (762 lines of tests across 7 test files for 1198 lines of source).

Remaining Suggestions

Style: Log format doesn't follow STYLE_GUIDE.md pattern (field=<%s> | message) — 8 calls across task.py and generators/adversarial.py.
Defensive coding: No post-strategy.enhance() empty check — future algorithmic strategies could return empty and waste a turn.
Test coverage: Dict-returning callable target path in _call_target lacks direct unit test coverage.

None of these are blocking. The experimental/ namespace properly signals API instability, and the design (Generator → Experiment → Task → Evaluator → Report) is clean and extensible.
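
A sketch of a log call in the `field=<%s> | message` shape named in the review. The logger name, field names, and message are hypothetical, and separating multiple fields with commas is an assumption about the style guide rather than something stated in this thread.

```python
import logging

logger = logging.getLogger("redteam_example")


def log_turn(case_name: str, turn: int) -> None:
    # Structured fields first, then " | ", then a lowercase message
    # with no trailing punctuation. Arguments are passed lazily via %-style.
    logger.info("case_name=<%s>, turn=<%d> | sending attacker message", case_name, turn)
```

Passing the values as arguments instead of pre-formatting the string lets the logging module skip the formatting work when the level is disabled.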

jjbuck

Approved with just a few non-blocking nits noted for eventual transition from experimental to main.

poshinchen

Let's iterate on the action items in follow-up PRs.

* fix(redteam): align log format and cover dict-target path

Carry-over nits from PR #184:
- Align 8 log calls in task.py and generators/adversarial.py to the project's field=<%s> | message convention (no punctuation/capitals).
- Add unit tests for the _call_target dict-target branch (with and without a trace key), which was previously untested.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* refactor(redteam): unify strategies on run_attack with strategy-agnostic cases

Every AttackStrategy now owns its multi-turn loop via an abstract run_attack(case, call_target, ...) -> AttackRunResult; the task runner injects call_target (target invocation + tool-trace capture + per-case messages.clear isolation) and no longer branches on strategy type.

Why: a single execution model (strategy owns its loop) is simpler than a runner-owned loop plus a per-strategy exception. Cases become strategy-agnostic (no strategy/template baked into RedTeamConfig); the RedTeamExperiment holds the strategy instances and expands the case x strategy cross-product at run time, so hand-crafted cases and strategy comparison (by label) are both first-class.

- base.py: run_attack @abstractmethod + AttackRunResult dataclass; add label (instance id, defaults to name); remove the unused enhance().
- PromptStrategy: relocate the ActorSimulator loop from task.py into run_attack (gradual_escalation behavior unchanged).
- RedTeamConfig: drop strategy/system_prompt_template + their validator.
- generators/adversarial: generate_cases emits strategy-agnostic cases; rename target -> agent; drop attack_strategies.
- experiment: rename target -> agent; accept attack_strategies; build _by_label (duplicate label -> ValueError); expand cross-product before delegating to the base worker (left untouched).
- task: build call_target, look up the case's strategy by label, map AttackRunResult to the {"output", "trajectory", ...} dict.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(redteam): add Crescendo multi-turn attack strategy

CrescendoStrategy escalates gradually across turns, each attacker message building on the target's previous answer. On a refusal it backtracks by simply not appending the refused (question, response) pair and retrying with a fresh question (up to max_backtracks), so the refused turn never enters the history -- a simpler equivalent of PyRIT's excluding-last-turn approach. It stops early once a turn scores at/above success_threshold.

The refusal/success/question-generation helpers are module-level functions (is_refusal, success_score, gen_escalating_question) rather than methods, so future strategies (PAIR, TAP) can reuse them without importing a strategy class. They power the strategy's cheap in-loop "should I stop?" gate; success_score reads the case's success_criteria -- the same input the authoritative AttackSuccessEvaluator uses -- so the two never disagree on what counts as success, while the evaluator remains the sole verdict over the full trace. Parse failures degrade safely (question -> terminate preserving the conversation; judge -> score 0 and keep looping); only the evaluator raises.

The attacker model resolves to the ctor model first, then the experiment model. CrescendoStrategy is exported but intentionally NOT in BUILTIN_STRATEGIES (it is user-instantiated with params).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(redteam): add per-failure drill-down to the report

The aggregate sections (top-line attack-success rate, by_risk_category, by_strategy when more than one strategy ran) are unchanged. Each failure line now also shows the attacker's objective and the strategy's per-run stats (turns used, backtracks) so a multi-turn result like Crescendo is legible at a glance, not just a single score.
The strategy's run metadata reaches the report by merging AttackRunResult.metadata onto the case metadata in the task function; the base Experiment shares that dict with the EvaluationData it builds, so no base change is needed. Full turn-by-turn conversation output is left for a future verbose mode.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test(redteam): end-to-end wiring tests + fix strategy metadata join

Cover both user paths through the full locked interface with only the LLM layer mocked (attacker, in-loop judge, and the evaluator's judge agent):
- generated cases: generate_cases(agent=...) -> RedTeamExperiment with CrescendoStrategy -> run_evaluations -> RedTeamReport.
- hand-crafted cases: the same pipeline from RedTeamCase objects built by hand, skipping the generator (Model B's first-class path).

Live (real-Bedrock) runs surfaced a wiring bug these mock tests now guard: the strategy's run metadata (turns_used, backtracks) never reached the report. task_fn mutated case.metadata, but Pydantic copies that dict into a fresh EvaluationData, and the base Experiment doesn't carry task-returned metadata anyway.

Fix: the experiment now collects each case's run metadata (keyed by case name) and joins it onto the report in RedTeamReport.from_evaluation_reports -- keeping the base untouched and the collection logic on the RedTeamExperiment layer (where it stays put if the experiment later stops extending the base).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(redteam): address pre-merge review (idempotency, refusal accuracy, leaks)

Adversarial self-review before opening the PR surfaced two correctness bugs and several maintainability issues; fixing them here.

Correctness:
- Cross-product expansion was mutating self._cases in place, so re-running an experiment squared it (c0__cre -> c0__cre__cre).
  _expand_cross_product is now pure (returns a new list) and run_evaluations_async swaps/restores self._cases around the base run, making reruns idempotent.
- is_refusal flagged compliant text containing refusal substrings ("I cannot stress enough... here are the steps", "I apologize, here is..."), dropping successful attacks from the trace and biasing results toward "attack failed". Markers are now only a cheap negative prefilter; on a marker hit a refusal judge (the previously-unused REFUSAL_JUDGE_SYSTEM_PROMPT) disambiguates, with a safe "keep the turn" fallback on parse failure.

Maintainability:
- Removed the leaky AttackRunResult.trajectory field (the task owns the trace via call_target); task_fn now assembles the output/trajectory payload directly.
- Unified turns_used to "turns kept in the conversation" across strategies; Crescendo additionally reports target_calls (incl. refused, backtracked calls).
- Documented max_turns as an experiment-level ceiling (strategy runs min of the two), the no-success_criteria behavior, and the max_workers=1 requirement; run_evaluations_async now rejects max_workers != 1 instead of relying on a comment.
- Dropped the now-unused resolve_strategy/DEFAULT_STRATEGY public surface.

Tests: idempotency, refusal false-positives + judge disambiguation, all-refusal empty conversation, ctor-vs-injected max_turns both directions, no-criteria run, direct async entry + coroutine/max_workers guards; e2e now asserts exact turns_used/backtracks with an engaging (non-refusal) target.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(redteam): add report.display(verbose=True) to show failure conversations

An LLM judge can be fooled by a target that *claims* to leak -- e.g. a target that, under escalation, emits a code block it presents as "my system prompt" which may be partly hallucinated. The aggregate report can't be verified by eye without the transcript.
display(verbose=True) now prints each failed case's full attacker/target conversation (default stays the compact aggregate + one-line drill-down), so a user can confirm whether a flagged "success" is a real leak or a false positive. The conversation is carried on AttackResult.conversation (from the case's actual_output).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(redteam): strategies own their turn budget; drop experiment max_turns

The experiment's max_turns (default 10) silently capped every strategy via min(strategy_max_turns, experiment_max_turns), so CrescendoStrategy(max_turns=30) under the default experiment ran only 10 turns -- quietly breaking the compare-same-strategy-different-params use case.

Each strategy now owns its turn budget; the task passes MAX_ALLOWED_TURNS (50) as a hard ceiling, so turn_cap = min(strategy.max_turns, 50). Removed max_turns from RedTeamExperiment.__init__ entirely. Added max_turns to PromptStrategy so gradual_escalation keeps its prior default of 10 (and its {max_turns} prompt text) rather than jumping to the ceiling.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(redteam): second adversarial pass + address reviewer bot feedback

- judges score each response statelessly (clear history per call) so earlier turns don't bias the in-loop refusal/success verdicts
- correct backtrack docstring: it is report-scope only, the target's own context is not rolled back; add a proof test
- drop dead keys from task_fn return dict (base reads only output/trajectory)
- export AttackRunResult publicly (part of the strategy extension contract)
- remove unused system_prompt_template from base AttackStrategy
- fix log-statement separators; extract dense metadata merge into locals
- add hardening cross-ref comments (lazy-init attacker, _cases swap)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(redteam): add TargetSession protocol and implementations

Introduce TargetSession (invoke/snapshot/restore/supports_rewind/trace) as the handle a strategy uses to talk to the target, replacing the opaque call_target in a follow-up. AgentTargetSession wraps a strands.Agent and is rewindable via the SDK snapshot API (deep-copy rollback); CallableTargetSession wraps an opaque callable and reports supports_rewind=False. Bumps strands-agents floor to >=1.36.0 for the snapshot API.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(redteam): replace call_target with TargetSession across strategies

Switch run_attack from call_target: Callable[[str], str] to a TargetSession, so a strategy can roll the target back via the session's snapshot/restore. Crescendo now does a real state rollback on a refusal for rewindable (Agent) targets and degrades to report-scope backtracking for opaque callables; both keep the refused turn in AttackRunResult.pruned_branches as defended-turn evidence.
The report surfaces that evidence: display() is flattened to a case x strategy matrix plus a per-attack table (every attack, breached and defended), closing the gap where a fully-defended run looked empty. Score aggregation across evaluators switches min -> max (worst-case = strongest attack). The trace is rolled back alongside messages so backtracked tool calls no longer ghost into the trajectory.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(redteam): address adversarial-review findings on TargetSession + report

- Matrix now pivots on the base case name: the cross-product names work items "{case}__{strategy}", so without stripping the suffix every cell landed on its own row and the case x strategy grid was meaningless (AR-5).
- Replace the snapshot.app_data["_trace_len"] mutation with an explicit TargetCheckpoint(agent_snapshot, trace_len) dataclass returned by snapshot() and consumed by restore() -- no stashing internal keys on the SDK object, and trace/messages roll back together (AR-1/AR-2).
- Move per-case isolation into TargetSession.reset() (clears the wrapped agent's history + trace) instead of task.py reaching into agent.messages (AR-7).

Verified live against Bedrock: backtrack still rolls back (backtracks=4, blocked=4) and the 2x2 cross-product matrix renders one row per case.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(redteam): second-pass review cleanups on TargetSession + report

- Remove a duplicated paragraph in the crescendo module docstring.
- Add `from __future__ import annotations` to target_session.py to match every sibling module and keep the TargetCheckpoint forward reference safe.
- Export the session companions consistently: TargetCheckpoint joins TargetSession on the redteam facade (both are part of the strategy contract, like AttackRunResult); the two concrete impls are exported at the strategies package.
- Qualify the backtrack docstring/comment to "the target's state" -- the attacker agent keeps its own history (a known, separate quality limitation).
- Parameterize trace annotations as list[dict[str, Any]] to match the strategies layer.
- Guard the report matrix against a base-case/strategy key collision: if stripping the cross-product suffix would hide a result, fall back to full names.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(redteam): add TargetSession.trim_trace; tidy flat-table case column

- Add trim_trace(length) to the TargetSession protocol so a strategy rolls the tool trace back through the session instead of mutating session.trace directly (addresses the bot review: the protocol never promised .trace returns a mutable reference, so a defensive-copy impl would have silently ghosted refused-turn tool calls). AgentTargetSession.restore now reuses trim_trace; Crescendo's non-rewindable backtrack calls it instead of `del target_session.trace[...]`.
- Report flat table / transcript header show the base case name (the strategy column already disambiguates the cross-product), so the full "{case}__{strategy}" name no longer overflows the column into the risk field.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(redteam): drop callable target, harden TargetSession contract

- remove CallableTargetSession; input is Agent | TargetSession (TypeError otherwise)
- rename AgentTargetSession -> StrandsAgentSession
- trace: property -> plain attribute; fold trim into restore()
- split invoke into _send + _tool_uses_in; add ToolUseEntry TypedDict
- crescendo: never backtrack a tool-call turn; stop on it (keep breach evidence)
- resilient trace extraction (placeholder on malformed block keeps the gate honest)

* test(redteam): use a real TargetSession in experiment wiring tests

The lambda agents in test_experiment.py hit the new _build_session TypeError and passed only because the base experiment catches it as score=0 -- so the default-task and cross-product wiring was never actually exercised (run_attack was unreachable). Swap the lambdas for a _FakeSession so the intended paths run.

* fix(redteam): reset target to clean baseline, not just messages

StrandsAgentSession.reset() only cleared messages, but snapshot()/restore() round-trip the full session preset (messages, state, conversation_manager_state, interrupt_state). So agent state leaked across cases -- a tool writing agent.state in case N would still be set in case N+1, which can flip a later attack's outcome.

The experiment now captures one clean baseline at task-build time (before the first case, while the shared agent is still as-constructed) and reset() rolls back through the same load_snapshot path restore() uses. Seeded target history is preserved (it's part of the target definition); per-case state is cleared.
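
The capture-once, replay-per-case idea in the commit above can be sketched without the SDK. `FakeTarget` and `BaselineSession` are hypothetical stand-ins, and `copy.deepcopy` is used here in place of the SDK snapshot API that the real session relies on.

```python
import copy
from dataclasses import dataclass, field
from typing import Any


@dataclass
class FakeTarget:
    """Stand-in target holding both conversation history and tool-written state."""

    messages: list[str] = field(default_factory=list)
    state: dict[str, Any] = field(default_factory=dict)


class BaselineSession:
    """Hypothetical session: capture one clean baseline, replay it on every reset."""

    def __init__(self, target: FakeTarget) -> None:
        self.target = target
        # Deep copy so later mutations of the target cannot alias into the baseline.
        self._baseline = copy.deepcopy(target)

    def reset(self) -> None:
        # Copy again on the way out so the stored baseline is never handed to the target.
        clean = copy.deepcopy(self._baseline)
        self.target.messages = clean.messages
        self.target.state = clean.state


target = FakeTarget(messages=["seeded system context"])
session = BaselineSession(target)
```

Copying on both capture and restore is what keeps a single baseline usable across any number of cases; sharing the stored lists with the live target would let the first case corrupt it.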
* test(redteam): pin baseline-reset invariants; tighten _build_session typing

Follow-up to the reset fix after an adversarial review pass:
- type _build_session(baseline) as Snapshot | None instead of Any (it feeds load_snapshot, so a non-Snapshot would only surface as a swallowed per-case error)
- add a real-Agent test that one baseline survives repeated resets uncorrupted (the capture-once/replay-N aliasing risk), and a test locking the documented limitation that a no-baseline session does not isolate non-message state

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
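
The case x strategy cross-product and its idempotency fix, both described in the commits above, can be sketched as a pure function. `Case`, `expand_cross_product`, and `base_case_name` are hypothetical names for illustration; only the "{case}__{strategy}" naming and the duplicate-label ValueError come from the commit messages.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Case:
    """Minimal stand-in for a strategy-agnostic red-team case."""

    name: str


def expand_cross_product(cases: list[Case], labels: list[str]) -> list[Case]:
    """Return new case x strategy work items; the input list is left untouched."""
    if len(set(labels)) != len(labels):
        raise ValueError("duplicate strategy label")
    return [Case(name=f"{case.name}__{label}") for case in cases for label in labels]


def base_case_name(work_item_name: str) -> str:
    """Strip the strategy suffix so report rows can pivot on the original case."""
    return work_item_name.rsplit("__", 1)[0]


cases = [Case("c0"), Case("c1")]
first_run = expand_cross_product(cases, ["cre", "esc"])
second_run = expand_cross_product(cases, ["cre", "esc"])
```

Because the function builds a new list from the original cases each time, a second run produces the same work items instead of names like c0__cre__cre.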

kevmyung had a problem deploying to manual-approval March 31, 2026 23:14 — with GitHub Actions Failure

kevmyung had a problem deploying to manual-approval March 31, 2026 23:21 — with GitHub Actions Failure

kevmyung had a problem deploying to manual-approval April 25, 2026 00:17 — with GitHub Actions Failure

kevmyung force-pushed the feat/red-team-foundation branch from 8d7d3f5 to c9f5845 Compare April 25, 2026 04:18

kevmyung had a problem deploying to manual-approval April 25, 2026 04:19 — with GitHub Actions Failure

poshinchen reviewed May 1, 2026

kevmyung temporarily deployed to manual-approval May 1, 2026 15:23 — with GitHub Actions Inactive

poshinchen reviewed May 8, 2026

Comment thread src/strands_evals/experimental/redteam/evaluators/attack_success_evaluator.py

poshinchen reviewed May 8, 2026

Comment thread src/strands_evals/evaluators/red_team_judge_evaluator.py Outdated

poshinchen reviewed May 8, 2026

Comment thread src/strands_evals/evaluators/red_team_judge_evaluator.py Outdated

poshinchen reviewed May 11, 2026

This was referenced May 11, 2026

[FEATURE][P0] Red-Teaming: Core Pipeline — multi-turn attack strategies, evaluator, experiment, reporting #220

Closed

[FEATURE][P0.5] Red-Teaming: Agent-topology attacks & skill-based red-teaming #221

Open

kevmyung had a problem deploying to manual-approval May 12, 2026 01:06 — with GitHub Actions Failure

github-actions Bot added strands-running and removed strands-running labels May 12, 2026